---
title: The Latent Influence of Ideology on Partisan Media Consumption
authors:
  - name: Matthew E. Dardet
    department: Department of Government
    affiliation: Harvard University
    location: Cambridge, MA 02138
    email: matthewdardet@fas.harvard.edu
  - name: Ben TerMaat
    department: Department of Government
    affiliation: Harvard University
    location: Cambridge, MA 02138
    email: btermaat@g.harvard.edu
abstract: |
  
  We address the persistent debate in political science of whether or not voters are becoming more polarized, and consuming their news in echo chambers as a result of digital technologies. By disentangling ideology and partisan identification, we show that models of Americans' media consumption that incorporate measures of respondent \textit{ideological proclivities} can be less model dependent, explain more variance than their ideology-less counterparts, and shrink kernel-based overlap coefficients over ideologically stratified distributions. To detect for initial evidence of model dependence in models that lack an ideological component, we introduce a new, simple variance-parametric statistic---mean absolute uncertainty deviation (MAUD) and its scaled cousin, standardized mean absolute uncertainty deviation (SMAUD) uncertainty. Using these descriptive statistics, generalized information matrix (GIM) tests, and leave-one-covariate-out (LOCO) inference, we reanalyze high-quality data and research designs on media diets from Guess (2021) and Tyler, Grimmer, & Iyengar (2021). Finally, we find that using URL section names as proxies for classification of news articles explain slightly less variance in partisan media consumption, yet exhibit less model dependence compared to using a full logistic regression classifier, suggesting that the two methodologies  may not be measuring the exact same quantity.
  
keywords:
  - media diets
  - motivated reasoning
  - political polarization
  - machine learning
bibliography: references.bib
biblio-style: unsrt
output: rticles::arxiv_article
urlcolor: blue
fig.caption: yes
keep_tex: yes
header-includes: \usepackage{float} \usepackage{graphicx} \usepackage{longtable} \usepackage{rotating}
  \usepackage{hhline} \usepackage{dcolumn}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.pos = "!H", out.extra = "")
options(xtable.comment = FALSE)

# rm(list = ls())

# Load standard package suite. 
library(easypackages)
packages <- c("bayestestR", "car", "caret", "caTools", "conformalInference", 
              "data.table", "DeclareDesign", "devtools", "ebal", "estimatr", "extrafont", 
              "fixest", "foreign", "ggpubr", "ggridges", "ggthemes", "glmnet", 
              "gtsummary", "haven", "hrbrthemes", "here", "kableExtra", "lars", "lfe", 
              "lmtest", "Matching", "matchMulti", "margins", "mlr3", "modeest", 
              "modelsummary", "mvtnorm", "overlap", "papeR", "plotROC", 
              "randomizr", "redist", "rgenoud","ri2", "rio", "rjson", "RobustSE", 
              "rticles", "rvest", "sandwich", "scales", "sf", 
              "spatstat", "stargazer", "tidyverse", "urltools", "xtable")

devtools::install_github(repo="ryantibs/conformal", subdir="conformalInference")

if(length(which(!packages %in% installed.packages())) > 0) {
  install.packages(packages[!packages %in% installed.packages()])
}

libraries(packages)

# Set working directory. 
setwd(here::here())

# Load helper functions. 
source("scripts/functions.R") 

# Read datasets. 
yg15 <- read_csv("input/Guess_2021/data2015.csv")
yg16 <- read_csv("input/Guess_2021/data2016.csv")
yg16mob <- read_csv("input/Guess_2021/data2016mobile.csv")
```

# Introduction

In recent years, there have been numerous and valuable contributions to the relatively nascent study of the relationship between politics and digital technologies. While addressing this important issue, however, researchers have yet to reach a consensus even on the question of if the introduction of digital technologies has altered the political landscape. Starting with a replication of Guess' 2021 paper, we find that prior studies of this topic may have suffered from model dependence, and that the manner in which this relationship has been approached might overlook a crucial component: the elasticity of the relationship between ideology and partisanship. We introduce new methodological approaches to address these crucial questions, and find that a model that incorporates both ideology and partisanship more accurately reflects the current state of politics in a digital world. 


## Conflicting Findings in American Media Diets

On the one hand, some scholars worry that citizens consuming the news through social media has created a uniquely siloed information landscape (see Pariser 2011; Shmargad and Klar 2020; TKTK). On the other, numerous others contend that such concerns over a siloed electorate are overblown (see Boxell et al. 2017; Sunstein 2017; Guess 2021; TKTK). While these previous analyses provide much insight, due to their methodology and sampling, they overlook the crucial differentiation of ideological types within partisan breakdowns.

TKTK Will add several more paragraphs on this TKTK

## Trends in Polarization

### Party or Ideology?

By introducing ideological categories within partisan breakdowns, we find that the overlap of news consumption among partisan ideologues is far smaller than previously understood and decreased markedly between 2015 and 2016. Rather than treating partisan breakdown as a substitute for an individual’s ideology, we find that although many Americans may not have an incredibly partisan media diet (Guess 2021), those who are the most ideological do, in fact, have increasingly disparate and highly polarized sources of information.

TKTK Will add several more paragraphs on this TKTK 

# Research Design and Data

## Measuring Media Consumption and Computing Partisan Slants

Following Guess (2021), we use domain alignment scores from Bakshy et al. (2015) and a logistic regression-based news classifier to determine the average media consumption of each of the respondents who agreed to allow the Wakoopa search bar to track their web-browsing histories. 

```{r, echo = FALSE, message = FALSE, warning = FALSE}
# Figure 1: Partisan Slant Distributions Using Guess (2021) Data and Methodology.

# 2015 distributions (portals and no portals).
plt.15.p <- yg15 %>% 
  filter(pid3 != "Other/Don't know") %>% 
  ggplot(aes(diet_mean_newspol, fill = pid3, color = pid3, weight = 
               weights/sum(weights))) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.5, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = 0.2, y = 1.75, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "black",
                                "firebrick1")) +
  theme_classic() + 
  ggtitle("News/politics only (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Party",
       fill = "Party") +
  lims(x = c(-1, 1)) + 
  theme(legend.position = "none")

plt.15.np <- yg15 %>% 
  filter(pid3 != "Other/Don't know") %>% 
  ggplot(aes(diet_mean_noportals, fill = pid3, color = pid3, 
             weight = weights/sum(weights))) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.5, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = 0.2, y = 1.75, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "black",
                                "firebrick1")) +
  theme_classic() + 
  ggtitle("News/politics only, portals excluded (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Party",
       fill = "Party") +
  lims(x = c(-1, 1)) + 
  theme(legend.position = "none")

# 2016 distributions (portals and no portals).
plt.16.p <- yg16 %>% 
  filter(pid3 != "Other/Don't know") %>% 
  ggplot(aes(diet_mean_newspol, fill = pid3, color = pid3, 
             weight = weight/sum(weight))) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.65, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = 0.12, y = 1.75, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "black",
                                "firebrick1")) +
  theme_classic() + 
  ggtitle("News/politics only (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Party",
       fill = "Party") +
  lims(x = c(-1, 1)) + 
  theme(legend.position = "none")

plt.16.np <- yg16 %>% 
  filter(pid3 != "Other/Don't know") %>% 
  ggplot(aes(diet_mean_noportals, fill = pid3, color = pid3, 
             weight = weight/sum(weight))) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.65, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = 0.05, y = 1.75, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "black",
                                "firebrick1")) +
  theme_classic() + 
  ggtitle("News/politics only, portals excluded (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Party",
       fill = "Party") +
  lims(x = c(-1, 1)) + 
  theme(legend.position = "none")

# Arrange, save, and output plot. 
plt.1 <- ggarrange(plt.15.p, plt.15.np,
                   plt.16.p, plt.16.np) %>% 
  annotate_figure(top = text_grob("Figure 1. Americans' Online Media Diets by Partisanship",
                  face = "bold",
                  size = 14))
ggsave("output/figures/figure_1.pdf", device = "pdf", width = 10, height = 8)
```
```{r, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Load Figure 1 into the RMarkdown document. 
knitr::include_graphics("output/figures/figure_1.pdf")
```

As shown in Figure 1, we replicate Guess' finding that in both 2015 and 2016 the media diets of Americans overlap quite significantly across partisan identification. Following Guess, we consider the "overlapping coefficient," (Inman and Bradley Jr. 1989), with the formula

$1 - \frac {1}{2} \int_{-\infty}^{\infty} |f(x) - g(x)| dx.$


The 2015 results reflect an overlap coefficient of 0.63, meaning that the average media diet for both Democrats and Republicans is captured within 63% of the total distribution. Moving to the 2016 results, the overlap coefficient for Democrats and Republicans decreases to 0.46.

```{r, echo = FALSE}
# Table 1: Overlap coefficients for each 

# 2015 distributions (portals and no portals).
overlap.15.p.dr  <- overlap(x = yg15 %>% # Overlap between Democrats and Republicans.
                                filter(pid3 == "Democrat") %>% 
                                pull(diet_mean_newspol), 
                            y = yg15 %>% 
                                filter(pid3 == "Republican") %>% 
                                pull(diet_mean_newspol))
overlap.15.p.di <- overlap(x = yg15 %>% # Overlap between Democrats and Independents.
                               filter(pid3 == "Democrat") %>% 
                               pull(diet_mean_newspol), 
                           y = yg15 %>% 
                               filter(pid3 == "Independent") %>% 
                               pull(diet_mean_newspol))
overlap.15.p.ri <- overlap(x = yg15 %>% # Overlap between Republicans and Independents.
                               filter(pid3 == "Republican") %>% 
                               pull(diet_mean_newspol), 
                           y = yg15 %>% 
                               filter(pid3 == "Independent") %>% 
                               pull(diet_mean_newspol))
overlap.15.np.dr  <- overlap(x = yg15 %>% # Overlap between Democrats and Republicans.
                                 filter(pid3 == "Democrat") %>% 
                                 pull(diet_mean_noportals), 
                             y = yg15 %>% 
                                 filter(pid3 == "Republican") %>% 
                                 pull(diet_mean_noportals))
overlap.15.np.di <- overlap(x = yg15 %>% # Overlap between Democrats and Independents.
                                filter(pid3 == "Democrat") %>% 
                                pull(diet_mean_noportals), 
                            y = yg15 %>% 
                                filter(pid3 == "Independent") %>% 
                                pull(diet_mean_noportals))
overlap.15.np.ri <- overlap(x = yg15 %>% # Overlap between Republicans and Independents.
                                filter(pid3 == "Republican") %>% 
                                pull(diet_mean_noportals), 
                            y = yg15 %>% 
                                filter(pid3 == "Independent") %>% 
                                pull(diet_mean_noportals))

# 2016 distributions (portals and noportals).
overlap.16.p.dr  <- overlap(x = yg16 %>% # Overlap between Democrats and Republicans.
                                filter(pid3 == "Democrat") %>% 
                                pull(diet_mean_newspol), 
                            y = yg16 %>% 
                                filter(pid3 == "Republican") %>% 
                                pull(diet_mean_newspol))
overlap.16.p.di <- overlap(x = yg16 %>% # Overlap between Democrats and Independents.
                               filter(pid3 == "Democrat") %>% 
                               pull(diet_mean_newspol), 
                           y = yg16 %>% 
                               filter(pid3 == "Independent") %>% 
                               pull(diet_mean_newspol))
overlap.16.p.ri <- overlap(x = yg16 %>% # Overlap between Republicans and Independents.
                               filter(pid3 == "Republican") %>% 
                               pull(diet_mean_newspol), 
                           y = yg16 %>% 
                               filter(pid3 == "Independent") %>% 
                               pull(diet_mean_newspol))
overlap.16.np.dr  <- overlap(x = yg16 %>% # Overlap between Democrats and Republicans.
                                 filter(pid3 == "Democrat") %>% 
                                 pull(diet_mean_noportals), 
                             y = yg16 %>% 
                                 filter(pid3 == "Republican") %>% 
                                 pull(diet_mean_noportals))
overlap.16.np.di <- overlap(x = yg16 %>% # Overlap between Democrats and Independents.
                                filter(pid3 == "Democrat") %>% 
                                pull(diet_mean_noportals), 
                            y = yg16 %>% 
                                filter(pid3 == "Independent") %>% 
                                pull(diet_mean_noportals))
overlap.16.np.ri <- overlap(x = yg16 %>% # Overlap between Republicans and Independents.
                                filter(pid3 == "Republican") %>% 
                                pull(diet_mean_noportals), 
                            y = yg16 %>% 
                                filter(pid3 == "Independent") %>% 
                                pull(diet_mean_noportals))
oc.matrix <- matrix(data = c(overlap.15.p.dr, overlap.15.np.dr, overlap.16.p.dr, overlap.16.np.dr,
                             overlap.16.p.dr - overlap.15.p.dr, overlap.16.np.dr - overlap.15.np.dr, 
                             overlap.15.p.di, overlap.15.np.di, overlap.16.p.di, overlap.16.np.di,
                             overlap.16.p.di - overlap.15.p.di, overlap.16.np.di - overlap.15.np.di, 
                             overlap.15.p.ri, overlap.15.np.ri, overlap.16.p.ri, overlap.16.np.ri,
                             overlap.16.p.ri - overlap.15.p.ri, overlap.16.np.ri - overlap.15.np.ri) %>% 
                      round(digits = 3),
                    nrow = 3, 
                    ncol = 6,
                    byrow = TRUE,
                    dimnames = list(c("Dem-Rep", "Dem-Ind", "Rep-Ind"),
                                    c("2015", "2015 (no portals)", "2016", "2016 (no portals)", 
                                      "$\\Delta_{P}$", "$\\Delta_{NP}$")))
kable(oc.matrix,
      align=rep('c', 6),
      format = "latex",
      booktabs = T,
      escape = FALSE,
      caption = "\\textbf{Overlap Coefficients for Americans' Partisan Media Diets}") %>% 
      kable_styling(position = "center", 
                    latex_options = "HOLD_position")
```

[TODO: we find, after computing overlap coefficients, that the overlap in partisan media diets has considerably considerately in 2016 compared to 2015, as evidenced by the negative coefficients on the change in portal and nonportal diet alignments from 2015 to 2016. It is unclear whether this is a sign of a significant change in news consumption between 2015 and 2016 or a statistical artefact induced by changes in the survey and observational samples between the two years. We will return to this question later in the paper...]

```{r, echo = FALSE, results = 'asis'}
# Table 2: Original Regressions Predicting Partisan Media Consumption. 

# Recode the covariates. 
yg15$pid3 <- fct_relevel(yg15$pid3, "Other/Don't know")
yg15$ideo5 <- fct_relevel(yg15$ideo5, "Don't know")
yg15$race <- fct_relevel(yg15$race, "Other")
yg15$gender <- as_factor(yg15$gender)
yg15$educ <- as_factor(yg15$educ)
yg15$educ <- fct_relevel(yg15$educ, "Less than high school")
yg15$educ <- fct_relevel(yg15$educ, levels(yg15$educ)[c(1,4,2,5,3)])
yg16$pid3 <- fct_relevel(yg16$pid3, "Other/Don't know")
yg16$ideo5 <- fct_relevel(yg16$ideo5, "Don't know")
yg16$race <- fct_relevel(yg16$race, "Other")
yg16$gender <- as_factor(yg16$gender)
yg16$gender <- fct_relevel(yg16$gender, "Male")
yg16$educ <- as_factor(yg16$educ)
yg16$educ <- fct_relevel(yg16$educ, "Less than high school")
yg16$educ <- fct_relevel(yg16$educ, levels(yg16$educ)[c(1,4,2,5,3)])

# Run models. 
model.1.2015 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + pid3,
                   data = yg15,
                   weights = weights)
model.1.2016 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + pid3,
                   data = yg16, 
                   weights = weight)

# Sequester variable names.
varnames <- str_replace(str_replace(str_replace(str_replace(str_replace(
  str_replace(attr(summary(model.1.2015)$coefficients, "dimnames")[[1]], 
              "age", "Age: "), "race", "Race: "), "educ", ""), "pid3", ""), 
  "gender", ""), "income", "Income level")

# Output model results in stargazer table. 
stargazer(model.1.2015,
          model.1.2016, 
          style = "apsr", 
          title = "\\textbf{Determinants of Media Diet Slant}",
          dep.var.labels = "Average media diet slant (news/politics only)", 
          column.labels = c("\\emph{2015}", "\\emph{2016}"), 
          se = starprep(model.1.2015, model.1.2016),
          p = starprep(model.1.2015, model.1.2016, stat = "p.value"),
          covariate.labels = varnames[2:length(varnames)],
          omit.stat = c("f", "ser", "rsq"), 
          column.sep.width = "0pt", 
          star.cutoffs = c(.05, .01),
          align = TRUE,
          notes = c("OLS regressions with HC2 robust standard errors in", 
                    "parentheses; YouGov survey data with weights applied."),
          notes.append = TRUE,
          header = FALSE,
          float = TRUE,
          table.placement = "H")
```

## Model Dependence in Existing Models of Partisan Media Consumption

By applying a Generalized Information Matrix (GIM), we find evidence of model dependence in the existing analyses of partisan media consumption (King and Roberts 2015).
[TODO: introduce MAUD or SMAUD]
[TODO: summary of GIM Test]

```{r, echo = FALSE}
# Table 3: MAUDs, SMAUDs, and GIM Test results on original models. 

# Function to compute MAUD and SMAUD. Takes two vectors of coefficient standard 
# errors and a binary variable for standardization and outputs the desired
# summary statistic. 
maud.gen <- function(se.1, se.2, standardize = 0) {
  se.1 <- as.numeric(se.1)
  se.2 <- as.numeric(se.2)
  if (standardize) {
    statistic <- mean(abs(se.1-se.2)/se.1)
  } else {
    statistic <- mean(abs(se.1-se.2))
  }
  return(statistic)
}

# Compute MAUDs and SMAUDs for each model. 
maud.2015.1 <- maud.gen(starprep(model.1.2015)[[1]],
                        summary(model.1.2015)$coefficients[,2])
maud.2016.1 <- maud.gen(starprep(model.1.2016)[[1]],
                        summary(model.1.2016)$coefficients[,2])
smaud.2015.1 <- maud.gen(starprep(model.1.2015)[[1]],
                         summary(model.1.2015)$coefficients[,2], 1)
smaud.2016.1 <- maud.gen(starprep(model.1.2016)[[1]],
                         summary(model.1.2016)$coefficients[,2], 1)

# Transfer models to GLM form for incorporation in GIM Tests.
glm.1.2015 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + pid3,
                  data = yg15,
                  weights = weights)
glm.1.2016 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + pid3,
                  data = yg16, 
                  weights = weight)
```
```{r, eval = FALSE, echo = FALSE}
# Compute generalized information matrix (GIM) tests. 
GIM.1.2015 <- GIM(glm.1.2015, full = TRUE, B = 30, B2 = 25)
GIM.1.2016 <- GIM(glm.1.2016, full = TRUE, B = 30, B2 = 25)
```
```{r, echo = FALSE}
# Load tediously computed GIMs. 
load("input/GIM/GIM_1_2015.RData")
load("input/GIM/GIM_1_2016.RData")

# Function to extract desirable components of GIM object.
GIM.extract <- function(GIM.object) {
  rot <- GIM.object$`Rule of Thumb`
  tstat <- GIM.object$`GIM test statistic`
  pval <- GIM.object$`GIM pval`
  return(c(rot, 
           tstat,
           pval))
}

# Output results in table. 
dep.mat <- matrix(data = c(maud.2015.1, smaud.2015.1, GIM.extract(GIM.1.2015),
                           maud.2016.1, smaud.2016.1, GIM.extract(GIM.1.2016)) %>% 
                             round(digits = 3),
                  nrow = 2, 
                  ncol = 5, 
                  byrow = TRUE,
                  dimnames = list(c("2015", "2016"),
                                  c("MAUD", "SMAUD", "(GIM) Rule of Thumb", "Test Statistic", "\\textit{p}-value")))
kable(dep.mat,
      align = rep('c', 5),
      format = "latex",
      booktabs = T,
      escape = FALSE,
      caption = "\\textbf{Evidence for Model Dependence in Main Consumption Determinants Models}") %>% 
  kable_styling(position = "center", 
                latex_options = "HOLD_position")
```

## Considering Ideology Reduces Model Dependence and Shrinks Overlap Coefficients 

[TODO: history of partisan newspapers has largely been replaced ]

```{r, echo = FALSE, results = 'asis'}
# Table 4: Models that account for both partisanship and ideology. 

# Estimate different model specifications including . 
model.2.2015 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + ideo5,
                   data = yg15,
                   weights = weights)
model.3.2015 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + pid3 +
                     ideo5,
                   data = yg15,
                   weights = weights)
model.2.2016 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + ideo5,
                   data = yg16, 
                   weights = weight)
model.3.2016 <- lm(diet_mean_newspol ~ age + race + gender + income + educ + pid3 +
                   ideo5,
                   data = yg16, 
                   weights = weight)

varnames <- str_replace(str_replace(str_replace(str_replace(str_replace(str_replace(
  str_replace(attr(summary(model.3.2015)$coefficients, "dimnames")[[1]], 
              "age", "Age: "), "race", "Race: "), "educ", ""), "pid3", ""), 
  "gender", ""), "income", "Income level"), "ideo5", "")

# Output model results in stargazer table. 
stargazer(model.1.2015,
          model.2.2015,
          model.3.2015,
          model.1.2016,
          model.2.2016,
          model.3.2016,
          style = "apsr", 
          title = "\\textbf{Determinants of Media Diet Slant (Including Ideology)}",
          dep.var.labels = "Average media diet slant (news/politics only)", 
          column.labels = c("\\emph{2015} (PID)", 
                            "\\emph{2015} (Ideo)",
                            "\\emph{2015} (Combined)",
                            "\\emph{2016} (PID)",
                            "\\emph{2016} (Ideo)",
                            "\\emph{2016} (Combined)"), 
          se = starprep(model.1.2015, model.2.2015, model.3.2015,
                        model.1.2016, model.2.2016, model.3.2016),
          p = starprep(model.1.2015, model.2.2015, model.3.2015,
                       model.1.2016, model.2.2016, model.3.2016,
                       stat = "p.value"),
          covariate.labels = varnames[2:length(varnames)],
          omit.stat = c("f", "ser", "rsq"), column.sep.width = "0pt", 
          star.cutoffs = c(.05, .01),
          align = TRUE,
          notes = c("OLS regressions with HC2 robust standard errors in parentheses; YouGov survey data with weights applied."),
          add.lines = list("",
                           c("Party ID", "\\checkmark", "", "\\checkmark", 
                             "\\checkmark", "", "\\checkmark"),
                           c("Ideology", "", "\\checkmark", "\\checkmark", 
                             "", "\\checkmark", "\\checkmark")),
          notes.append = TRUE,
          header = FALSE,
          float = TRUE,
          table.placement = "H",
          font.size = "small")
```

[TODO: Interpretation/explanation of big regression table.]

```{r, echo = FALSE}
# Compute MAUDs and SMAUDs for various model types. 
maud.2015.2 <- maud.gen(starprep(model.2.2015)[[1]], 
                        summary(model.2.2015)$coefficients[,2])
maud.2015.3 <- maud.gen(starprep(model.3.2015)[[1]],
                        summary(model.3.2015)$coefficients[,2])
maud.2016.2 <- maud.gen(starprep(model.2.2016)[[1]], 
                        summary(model.2.2016)$coefficients[,2])
maud.2016.3 <- maud.gen(starprep(model.3.2016)[[1]],
                        summary(model.3.2016)$coefficients[,2])
smaud.2015.2 <- maud.gen(starprep(model.2.2015)[[1]], 
                         summary(model.2.2015)$coefficients[,2], 1)
smaud.2015.3 <- maud.gen(starprep(model.3.2015)[[1]],
                         summary(model.3.2015)$coefficients[,2], 1)
smaud.2016.2 <- maud.gen(starprep(model.2.2016)[[1]], 
                         summary(model.2.2016)$coefficients[,2], 1)
smaud.2016.3 <- maud.gen(starprep(model.3.2016)[[1]],
                         summary(model.3.2016)$coefficients[,2], 1)
mauds <- c(maud.2015.1, maud.2015.2, maud.2015.3,
           maud.2016.1, maud.2016.2, maud.2016.3)
smauds <- c(smaud.2015.1, smaud.2015.2, smaud.2015.3,
            smaud.2016.1, smaud.2016.2, smaud.2016.3)

# Estimate GLM versions of models for inclusion in the GIM test functions. 
glm.2.2015 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + ideo5,
                  data = yg15,
                  weights = weights)
glm.3.2015 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + pid3 +
                     ideo5,
                  data = yg15,
                  weights = weights)
glm.2.2016 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + ideo5,
                  data = yg16, 
                  weights = weight)
glm.3.2016 <- glm(diet_mean_newspol ~ age + race + gender + income + educ + pid3 +
                  ideo5,
                  data = yg16, 
                  weights = weight)
```

```{r, echo = FALSE, eval=FALSE}
# Compute GIM tests for above models. 
GIM.2.2015 <- GIM(glm.2.2015, full = TRUE, B = 30, B2 = 25)
GIM.3.2015 <- GIM(glm.3.2015, full = TRUE, B = 30, B2 = 25)
GIM.2.2016 <- GIM(glm.2.2016, full = TRUE, B = 30, B2 = 25)
GIM.3.2016 <- GIM(glm.3.2016, full = TRUE, B = 30, B2 = 25)
```

```{r, echo = FALSE}
# Figure 2. Decreasing Model Dependence Through Incorporation of Respondent Ideology

# Load tediously computed GIM test objects.  
load("input/GIM/GIM_2_2015.RData")
load("input/GIM/GIM_3_2015.RData")
load("input/GIM/GIM_2_2016.RData")
load("input/GIM/GIM_3_2016.RData")

# MAUD/SMAUD subfigures. 
unc.dat <- data.frame(maud = mauds, 
                     smaud = smauds, 
                     rot = c(GIM.1.2015$`Rule of Thumb`,
                             GIM.2.2015$`Rule of Thumb`,
                             GIM.3.2015$`Rule of Thumb`,
                             GIM.1.2016$`Rule of Thumb`,
                             GIM.2.2016$`Rule of Thumb`,
                             GIM.3.2016$`Rule of Thumb`),
                     model = rep(c("PID", "Ideo", "Combo"), 2),
                     year = c(rep(2015, 3),
                              rep(2016, 3)))

m.plt <- unc.dat %>% 
  ggplot(aes(x = maud, y = model, label = model)) +
  geom_point(size = 3, shape = 24, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  facet_grid(~year) +
  labs(x = "MAUD",
       y = "Model", 
       title = "MAUDs by Model Specification") + 
  theme_pubr()

sm.plt <- unc.dat %>% 
  ggplot(aes(x = smaud, y = model, label = model)) +
  geom_point(size = 3, shape = 23, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  facet_grid(~year) +
  labs(x = "SMAUD",
       y = "Model", 
       title = "SMAUDs by Model Specification") + 
  theme_pubr()
  
# GIM statistics subfigures. 
rot.plt <- unc.dat %>% 
  ggplot(aes(x = rot, y = model, label = model)) +
  geom_point(size = 3, shape = 22, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  facet_grid(~year) +
  labs(x = "(GIM) Rule of Thumb",
       y = "Model", 
       title = "(GIM) Rules of Thumb by Model Specification") + 
  theme_pubr()

# Arrange, save, and output plot. 
plt.2 <- ggarrange(m.plt,
                   sm.plt,
                   rot.plt,
                   nrow = 3) %>% 
  annotate_figure(top = text_grob("Figure 2. Inclusion of Ideology Reduces Model Dependence",
                  face = "bold",
                  size = 14))
ggsave("output/figures/figure_2.pdf", device = "pdf", width = 8, height = 10)
```
```{r, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Load Figure 1 into the RMarkdown document. 
knitr::include_graphics("output/figures/figure_2.pdf")
```

[Following Converse (1964), we stratify individuals into political ideologues, near ideologues, and true moderates.]

```{r, echo = FALSE, warning = FALSE, message = FALSE}
# Figure 3. Americans' Media Diets by Ideology.

# Mutate data.
yg15 <- yg15 %>% # 2015
  mutate(pol.class = case_when(pid3 == "Republican" & ideo5 == "Very conservative" ~ "Rep Ideologue", 
 pid3 == "Democrat" & ideo5 == "Very liberal" ~ "Dem Ideologue",
 pid3 == "Republican" & ideo5 == "Conservative" ~ "Rep Near Ideologue",
 pid3 == "Democrat" & ideo5 == "Liberal" ~ "Dem Near Ideologue",
 (ideo5 == "Moderate" | ideo5 == "Don't know") & (pid3 == "Democrat" | pid3 == "Republican") ~ "Moderate Partisan", 
 pid3 == "Independent" & ideo5 == "Moderate" ~ "True Moderate",
 TRUE ~ "No issue content")) %>% 
  mutate(pol.class = factor(pol.class))
yg15$pol.class <- fct_relevel(yg15$pol.class, "No issue content")
yg16 <- yg16 %>% # 2016
  mutate(pol.class = case_when(pid3 == "Republican" & ideo5 == "Very conservative" ~ "Rep Ideologue", 
 pid3 == "Democrat" & ideo5 == "Very liberal" ~ "Dem Ideologue",
 pid3 == "Republican" & ideo5 == "Conservative" ~ "Rep Near Ideologue",
 pid3 == "Democrat" & ideo5 == "Liberal" ~ "Dem Near Ideologue",
 (ideo5 == "Moderate" | ideo5 == "Don't know") & (pid3 == "Democrat" | pid3 == "Republican") ~ "Moderate Partisan", 
 pid3 == "Independent" & ideo5 == "Moderate" ~ "True Moderate",
 TRUE ~ "No issue content")) %>% 
  mutate(pol.class = factor(pol.class))
yg16$pol.class <- fct_relevel(yg16$pol.class, "No issue content")

# Compute ideological overlap coefficients.
overlap.1 <- overlap(x = yg15 %>% # Overlap between Democrats and Republicans.
                          filter(pol.class == "Dem Ideologue") %>%
                          pull(diet_mean_newspol),
                      y = yg15 %>%
                          filter(pol.class == "Rep Ideologue") %>%
                          pull(diet_mean_newspol)) %>% 
             round(digits = 2)
overlap.2 <- overlap(x = yg15 %>% # Overlap between Democrats and Republicans.
                          filter(pol.class == "Dem Near Ideologue") %>%
                          pull(diet_mean_newspol),
                      y = yg15 %>%
                          filter(pol.class == "Rep Near Ideologue") %>%
                          pull(diet_mean_newspol)) %>% 
             round(digits = 2)
overlap.3 <- overlap(x = yg16 %>% # Overlap between Democrats and Republicans.
                          filter(pol.class == "Dem Ideologue") %>%
                          pull(diet_mean_newspol),
                      y = yg16 %>%
                          filter(pol.class == "Rep Ideologue") %>%
                          pull(diet_mean_newspol)) %>% 
             round(digits = 2)
overlap.4 <- overlap(x = yg16 %>% # Overlap between Democrats and Republicans.
                          filter(pol.class == "Dem Near Ideologue") %>%
                          pull(diet_mean_newspol),
                      y = yg16 %>%
                          filter(pol.class == "Rep Near Ideologue") %>%
                          pull(diet_mean_newspol)) %>% 
             round(digits = 2)

# Generate plots.
plt.15.p.i <- yg15 %>%
  filter(pol.class == "Rep Ideologue" | pol.class == "Dem Ideologue") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weights/sum(weights))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.6, y = 0.75,
           label = "Republicans", size = 3, color = "red") +
  annotate(geom = "text", x = -0.8, y = 1.5,
           label = "Democrats", size = 3, color = "blue") +
  annotate(geom = "text", x = 0.5, y = 1.5,
           label = paste("Overlap: ", overlap.1), size = 3) +
  scale_fill_grey() +
  scale_color_manual(values = c("blue",
                                "red")) +
  theme_classic() +
  ggtitle("Ideologues (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.15.p.ni <- yg15 %>%
  filter(pol.class == "Rep Near Ideologue" | pol.class == "Dem Near Ideologue") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weights/sum(weights))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.6, y = 0.7,
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = -0.775, y = 1.3,
           label = "Democrats", size = 3, color = "dodgerblue4") +
  annotate(geom = "text", x = 0.5, y = 1.5,
           label = paste("Overlap: ", overlap.2), size = 3) +
  scale_fill_grey() +
  scale_color_manual(values = c("dodgerblue4",
                                "firebrick1")) +
  theme_classic() +
  ggtitle("Near Ideologues (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.15.p.m <- yg15 %>%
  filter(pol.class == "Moderate Partisan" | pol.class == "True Moderate" | pol.class == "No issue content") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weights/sum(weights))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.35, y = 1.90,
           label = "Moderate Partisans", size = 3, color = "forestgreen") +
  annotate(geom = "text", x = -0.75, y = 1.5,
           label = "True Moderates", size = 3, color = "darkorchid4") +
  annotate(geom = "text", x = 0.75, y = 0.5,
           label = "No Issue Content", size = 3, color = "black") +
  scale_fill_grey() +
  scale_color_manual(values = c("black", 
                                "forestgreen",
                                "darkorchid4")) +
  theme_classic() +
  ggtitle("Moderates (2015; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.16.p.i <- yg16 %>%
  filter(pol.class == "Rep Ideologue" | pol.class == "Dem Ideologue") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weight/sum(weight))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.6, y = 1.05,
           label = "Republicans", size = 3, color = "red") +
  annotate(geom = "text", x = -0.65, y = 2,
           label = "Democrats", size = 3, color = "blue") +
  annotate(geom = "text", x = 0.5, y = 1.5,
           label = paste("Overlap: ", overlap.3), size = 3) +
  scale_fill_grey() +
  scale_color_manual(values = c("blue",
                                "red")) +
  theme_classic() +
  ggtitle("Ideologues (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.16.p.ni <- yg16 %>%
  filter(pol.class == "Rep Near Ideologue" | pol.class == "Dem Near Ideologue") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weight/sum(weight))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.6, y = 0.7,
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = -0.7, y = 1.5,
           label = "Democrats", size = 3, color = "dodgerblue4") +
  annotate(geom = "text", x = 0.5, y = 1.5,
           label = paste("Overlap: ", overlap.4), size = 3) +
  scale_fill_grey() +
  scale_color_manual(values = c("dodgerblue4",
                                "firebrick1")) +
  theme_classic() +
  ggtitle("Near Ideologues (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

plt.16.p.m <- yg16 %>%
  filter(pol.class == "Moderate Partisan" | pol.class == "True Moderate" | pol.class == "No issue content") %>% 
  ggplot(aes(diet_mean_newspol, fill = pol.class, color = pol.class,
             weight = weight/sum(weight))) +
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.4, y = 1.90,
           label = "Moderate Partisans", size = 3, color = "forestgreen") +
  annotate(geom = "text", x = -0.65, y = 1.5,
           label = "True Moderates", size = 3, color = "darkorchid4") +
  annotate(geom = "text", x = 0.75, y = 0.5,
           label = "No Issue Content", size = 3, color = "black") +
  scale_fill_grey() +
  scale_color_manual(values = c("black", 
                                "forestgreen",
                                "darkorchid4")) +
  theme_classic() +
  ggtitle("Moderates (2016; weighted)") +
  labs(x = "Average media diet slant",
       y = "Desity",
       color = "Ideological Class",
       fill = "Ideological Class") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none")

# Arrange and save the ideological class plots. 
plt.ideotypes <- ggarrange(plt.15.p.i, plt.16.p.i,
                           plt.15.p.ni, plt.16.p.ni,
                           plt.15.p.m, plt.16.p.m,
                           nrow = 3,
                           ncol = 2) %>% 
  annotate_figure(top = text_grob("Figure 3. Americans' Media Diets by Ideological Type",
                  face = "bold",
                  size = 14))
ggsave("output/figures/figure_3.pdf", device = "pdf", width = 8, height = 11)

# Compute cross-tabs of respondent types. 
pc.2015 <- (sort(prop.table(table(yg15$pol.class))) * 100) %>% round(digits = 3)
pc.2016 <- (sort(prop.table(table(yg16$pol.class))) * 100) %>% round(digits = 3)
```

[Hump for "no issue content"... a right-wing "shy trump voter" affecting the low-information/education/cognitive ability voters?]

```{r, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Output above plot. 
knitr::include_graphics("output/figures/figure_3.pdf")
```

[Cross-tables of class proportion.]
[2015:  Rep Ideologue      Dem Ideologue Rep Near Ideologue Dem Near Ideologue      True Moderate 
             7.112              8.980              9.124             11.638             13.649 
 Moderate Partisan   No issue content 
            18.032             31.466 ]
[2016: Rep Ideologue      Dem Ideologue Rep Near Ideologue      True Moderate Dem Near Ideologue 
             9.116             11.704             11.943             12.898             14.968 
 Moderate Partisan   No issue content 
            16.799             22.572 ]

[ALL GIM Tests return same $p$-value, 0.03225806, and seemingly arbitrary test statistic magnitudes, which may requite further investigation.]

## URL Section Name Proxies Perform Slightly Better than Logistic Classification of News Alignment

```{r, echo = FALSE, eval = FALSE}
# Implement  Tyler, Grimmer, & Iyenger's (2021) 2016 URL name proxy procedure.
urls <- read_csv("input/TGI_Data/news_domain_urls.csv") # Data from Peterson & Iyengar (2019).

urls <- urls %>% 
  select(-used_at) %>% 
  na.omit()
urls <- urls %>% 
  mutate(type = ifelse(type == "agregator", "portal", type))
urls %>% 
  pull(caseid) %>% 
  n_distinct()

parsed <- urls %>% 
  pull(url) %>% 
  urltools::url_parse() %>% 
  tibble()
suffix <- parsed %>% 
  pull(domain) %>% 
  urltools::suffix_extract() %>% 
  tibble()
urls <- bind_cols(urls %>% select(-domain, -path),
  parsed %>% 
  select(path), suffix) %>%
  select(-suffix, -subdomain, -host) %>% 
  mutate(path = if_else(is.na(path), "", path)) %>% 
  filter(!is.na(domain))

political_keywords <- read_csv("input/TGI_Data/PoliticalKeyWords.csv",
  col_names = FALSE)
political_keywords <- political_keywords %>%
  pull(X1) %>% str_c(collapse = "|")

urls <- urls %>% 
  mutate(political = 1 * str_detect(path, political_keywords),
  political = ifelse(is.na(political), 0, political))

urls <- urls %>% mutate(portal = 1 * (type == "portal"))

set.seed(pi + 4)

get_proper_domain <- function(x) {
  urltools::suffix_extract(urltools::url_parse(x)$domain)$domain
}

# Merge with survey data.
news <- urls

survey <- read_sav("input/TGI_Data/STAN0089_OUTPUT_all.sav")

survey <- survey %>%
  mutate(
    internet_primary = case_when(
    main_news_source_2_pre == 1 ~ 1,
    main_news_source_3_pre == 1 ~ 1,
    main_news_source_4_pre == 1 ~ 1,
    main_news_source_5_pre == 1 ~ 1,
    main_news_source_6_pre == 1 ~ 1,
    main_news_source_7_pre == 1 ~ 1,
    TRUE ~ 0)
    )

survey <- survey %>%
  mutate(
  caseid = as.integer(caseid),
  pid = pid7_pre %>% zap_labels() %>% na_if(8),
  party = case_when(
    pid %in% 1:3 ~ "dem",
    pid %in% 5:7 ~ "rep",
    TRUE ~ "ind") %>% as_factor(),
  income = faminc_pre %>% na_if(97),
  female = gender_pre,
  age = (2016 - birthyr_pre) %>% as.integer(), 
  educ = educ_pre,
  ideology = ideo5_pre %>% na_if(6),
  clinton = case_when(
    as_factor(presvote16_post) == "Hillary Clinton" ~ 1,
    TRUE ~ 0),
  trump = case_when(
    as_factor(presvote16_post) == "Donald Trump" ~ 1,
    TRUE ~ 0),
  voted_major = clinton + trump,
  abortion = Q26_pre,
  refugees = Q27_pre,
  primary = case_when(
    Q2_pre == 1 ~ 1,
    TRUE ~ 0),
  newsint = 1 * (Q31_pre == 1),
  race = case_when(
    race_pre == 1 ~ "White",
    race_pre == 2 ~ "Black",
    race_pre == 3 ~ "Hispanic",
    race_pre == 4 ~ "Asian",
    TRUE ~ "Other"
    ) %>% as_factor(),
  weight = weight_pulse %>% as.numeric(),
  ) %>% 
  select(caseid, pid, party, income, female, age, educ,
    ideology, clinton, trump, voted_major, refugees, abortion,
    primary, newsint, race, weight) %>% 
  as_factor() %>% 
  mutate_if(is.factor,  fct_drop)

news <- news %>% 
  left_join(survey, by = "caseid")

foo <- news %>% group_by(caseid) %>%
  summarise(pfrac = mean(portal)) %>% 
  left_join(survey, by = "caseid")

foo <- foo %>% group_by(pid) %>%
  summarise(m = mean(pfrac),
    se = sd(pfrac) / sqrt(n())) %>% 
  mutate(lwr = m - 2 * se, upr = m + 2 * se)

pid_levels <- c("Strong DEM", "Weak DEM", "Lean DEM",
  "Pure IND", "Lean REP", "Weak REP", "Strong REP")

foo <- foo %>% mutate(pid = factor(case_when(
  pid == 1 ~ "Strong DEM",
  pid == 2 ~ "Weak DEM",
  pid == 3 ~ "Lean DEM",
  pid == 4 ~ "Pure IND",
  pid == 5 ~ "Lean REP",
  pid == 6 ~ "Weak REP",
  pid == 7 ~ "Strong REP"),
  levels = pid_levels)) %>% 
  filter(!is.na(pid))

bakshy <- read_csv("input/TGI_Data/top500.csv") %>%
  select(domain, avg_align)
bakshy <- bakshy %>% mutate(
  domain = str_remove(domain, "www.")) %>% 
  distinct(domain, .keep_all = TRUE)

bakshy_domains <- bakshy$domain
proper_bakshy_domains <- bakshy$domain
bakshy_align <- bakshy$avg_align

find_bakshy_align <- function(url_list) {
  found <- map(url_list,
    ~ bakshy_domains[str_detect(., bakshy_domains)])
  for (i in 1:length(found)) {
    x <- found[[i]]
    actual_proper <- get_proper_domain(url_list[i])
    potential_proper <- get_proper_domain(x)
    found[[i]] <- x[potential_proper == actual_proper]
  }
  
  found <- map(found,
    ~ ifelse(identical(., character(0)), NA, .))
  found <- map_chr(found, ~
    ifelse(is.na(.), ., .[which.max(nchar(.))])) %>% 
    tibble(domain = .)
  found <- left_join(found, bakshy, by = "domain") %>%
    rename(bakshy_domain = domain)
  return(found)
}
```

```{r, echo = FALSE, eval = FALSE}
# Compute partisan alignments via URL section name proxies. 
# Warning: very computationally intensive (recommend running on AWS or just use
# the output I have already saved.)
news <- news %>% pull(url) %>% find_bakshy_align() %>%
  bind_cols(news, .)

news <- news %>% rename(avg_align = b_align)
saveRDS(news, file = "input/TGI_Data/TGI2021NewsMerge.rds")
```

```{r, echo = FALSE}
news <- readRDS("input/TGI_Data/TGI2021NewsMerge.rds")

just_domain <- function(x) suffix_extract(url_parse(x)$domain)$domain

alexa355 <- read.csv("input/TGI_Data/politicaldomains-final-edits.csv") %>%
  select(domain, type)
alexa355$domain <- just_domain(alexa355$domain)
alexa355 <- alexa355 %>% mutate(
  type = as.character(type),
  type = ifelse(type == "aggregator", "portal", type))

gdf <- news %>% select(caseid, avg_align, weight,
  age, race, female, income, ideology, newsint,
  educ, pid)

mean_align <- gdf %>% select(caseid, avg_align) %>%
  group_by(caseid) %>% 
  summarize(mean_align = mean(avg_align, na.rm = TRUE))

gdf <- mean_align %>%
  left_join(gdf %>% select(-avg_align) %>%distinct(),
    by = "caseid") %>% 
  distinct()

age_levels <- paste0("Age: ", c("Under 30", "30-44", "45-59", "60+"))
race_levels <- paste0("Race: ", c("Other", "White", "Black", "Hispanic"))
educ_levels <- paste0("Educ.: ", c("Some or no HS", "High school", "Some college", "College graduate", "Postgraduate"))
party_levels <- paste0("Party: ", c("Independent", "Democrat",
  "Republican"))
ideo_levels <- paste0("Ideo.: ", c("Moderate", "Liberal", "Conservative", "Very conservative", "Very liberal"))

inc_levels <- c("$10,000 - $19,999",
  "$20,000 - $29,999", "$50,000 - $59,999", 
  "$70,000 - $79,999",   "$30,000 - $39,999",
  "$120,000 - $149,999",
  "$80,000 - $99,999",   "$150,000 - $199,999", "$40,000 - $49,999",
  "$100,000 - $119,999", "$60,000 - $69,999",   "$150,000 or more",
  "Less than $10,000", "$350,000 - $499,999", "$200,000 - $249,999",
  "$250,000 - $349,999")

inc_levels <- inc_levels[c(13, 1, 2, 5, 9, 3, 11, 4,
  7, 10, 6, 8, 15, 16)]

ideo_levels <- c("Conservative", "Liberal", "Moderate",
  "Very conservative","Very liberal", "Don't know")
ideo_levels <- ideo_levels[c(6, 1, 2, 3, 4, 5)]

ideo5 <- addNA(gdf$ideology)
levels(ideo5) <- c(levels(gdf$ideology), "Don't know")
gdf$ideo5 <- ideo5
gdf <- gdf %>% mutate(
  Age = factor(case_when(
    age < 30 ~ age_levels[1],
    age >= 30 & age <= 45 ~ age_levels[2],
    age >= 45 & age <= 59 ~ age_levels[3],
    age >= 60 ~ age_levels[4]), levels = age_levels),
  Race = factor(case_when(
    race == "White" ~ race_levels[2],
    race == "Black" ~ race_levels[3],
    race == "Hispanic" ~ race_levels[4],
    TRUE ~ race_levels[1]), levels = race_levels),
  Educ = factor(case_when(
    educ == "No HS" ~ educ_levels[1],
    educ == "High school graduate" ~ educ_levels[2],
    educ %in% c("Some college", "2-year") ~ educ_levels[3],
    educ == "4-year" ~ educ_levels[4],
    educ == "Post-grad" ~ educ_levels[5]), levels = educ_levels),
  Party = factor(case_when(
    pid <= 3 ~ party_levels[2],
    pid == 4 ~ party_levels[1],
    pid >= 5 ~ party_levels[3]), levels = party_levels),
  Income = factor(income, levels = inc_levels),
  Income_Numeric = as.integer(Income),
  Ideology = factor(ideo5, levels = ideo_levels),
  Ideology_Numeric = as.integer(Ideology),
  Female = 1 * (female == "Female"),
)

# Tyler, Grimmer, and Iyengar (2021) methodology applied to Guess data. 
gnews <- readRDS("input/TGI_Data/GG2021NewsMerge.rds") # Guess (2021) data is confidential and the raw URL procedure has been omitted for data privacy reasons. 

mean_align <- gnews %>% select(id, avg_align) %>%
  group_by(id) %>% 
  summarize(mean_align = mean(avg_align, na.rm = TRUE))

gnews <- mean_align %>%
  left_join(gnews %>% select(-avg_align) %>% distinct(),
    by = "id") %>% 
  distinct()

# Tyler, Grimmer, and Iyengar (2021) expanded with more URL section name proxies
# and applied to Guess (2021) data.
hnews <- readRDS("input/TGI_Data/HG2021NewsMerge.rds")

mean_align <- hnews %>% select(id, avg_align) %>%
  group_by(id) %>% 
  summarize(mean_align = mean(avg_align, na.rm = TRUE))

hnews <- mean_align %>%
  left_join(hnews %>% select(-avg_align) %>% distinct(),
    by = "id") %>% 
  distinct()
```

```{r, echo = FALSE, warning = FALSE, message = FALSE}
# Figure 4. Americans' Online Media Diets (URL Section Proxy Method; 2016)

# Compute plots and  overlap coefficients. 
# TGI 2016. 
tgi.16 <- news %>% filter(!is.na(avg_align)) 
tgi.16$party <- recode(tgi.16$party, "dem" = "Democrat", "rep" = "Republican", "ind" = "Independent") 
tgi.16 <- tgi.16 %>%
  group_by(caseid, party) %>%
  summarize(mean_align = mean(avg_align)) %>% 
  ungroup()

rep <- tgi.16 %>% filter(party == "Republican") %>% pull(mean_align)
dem <- tgi.16 %>% filter(party == "Democrat") %>% pull(mean_align)
ind <- tgi.16 %>% filter(party == "Independent") %>% pull(mean_align)
tgi.16.dr <- overlap(dem, rep)
tgi.16.di <- overlap(dem, ind)
tgi.16.ri <- overlap(rep, ind)

plt.tgi.16 <- tgi.16 %>% 
  filter(!is.na(party)) %>% 
  ggplot(aes(mean_align, fill = party, color = party)) + 
  geom_density(alpha = 0.5) +
  annotate(geom = "text", x = 0.75, y = 0.5, 
           label = "Independents", size = 3, color = "black") +
  annotate(geom = "text", x = 0.25, y = 1.75, 
           label = "Republicans", size = 3, color = "firebrick1") +
  annotate(geom = "text", x = -0.8, y = 1, 
           label = "Democrats", size = 3, color = "dodgerblue4") +
  scale_fill_grey() + 
  scale_color_manual(values = c("dodgerblue4",
                                "firebrick1",
                                "black")) +
  theme_classic() + 
  ggtitle("Figure 4. Americans' Online Media Diets (URL Proxy)") +
  labs(x = "Average media diet slant",
       y = "Desity") +
  lims(x = c(-1, 1)) +
  theme(legend.position = "none",
        plot.title = element_text(size = 14, face = "bold"))
ggsave("output/figures/figure_4.pdf", device = "pdf", width = 6, height = 4)
```

```{r, echo = F, fig.show="hold", out.width="75%", fig.align="center"}
# Load Figure 4 into the RMarkdown document. 
knitr::include_graphics("output/figures/figure_4.pdf")
```

```{r, echo = FALSE, warning = FALSE, message = FALSE}
# Compute K-S tests and bootstrapped KS-tests of distributional equality. 
ks.overall <- ks.test(gdf$mean_align,
                      yg16$diet_mean_newspol) # Overall Distributions
ks.overall.boot <- ks.boot(gdf$mean_align,
                           yg16$diet_mean_newspol)
ks.dem <- ks.test(gdf$mean_align[gdf$Party == "Party: Democrat"],
        yg16$diet_mean_newspol[yg16$pid3 == "Democrat"]) # Democrat Distributions
ks.dem.boot <- ks.boot(gdf$mean_align[gdf$Party == "Party: Democrat"],
        yg16$diet_mean_newspol[yg16$pid3 == "Democrat"])

ks.ind <- ks.test(gdf$mean_align[gdf$Party == "Party: Independent"],
        yg16$diet_mean_newspol[yg16$pid3 == "Independent"]) # Independent Distributions
ks.ind.boot <- ks.boot(gdf$mean_align[gdf$Party == "Party: Independent"],
        yg16$diet_mean_newspol[yg16$pid3 == "Independent"])

ks.rep <- ks.test(gdf$mean_align[gdf$Party == "Party: Republican"],
        yg16$diet_mean_newspol[yg16$pid3 == "Republican"]) # Republican Distributions
ks.rep.boot <- ks.boot(gdf$mean_align[gdf$Party == "Party: Republican"],
        yg16$diet_mean_newspol[yg16$pid3 == "Republican"])
```

[Writeup overlap coefficients, interpretation, KS tests and bootstrapped KS tests manually.]

```{r, echo = FALSE, results = 'asis'}
# Table 5: Regressions Using URL Section Proxy Data. 

gdf$newsintxpid <- interaction(gdf$Party, gdf$newsint)
greg <- lm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party,
           data = gdf, 
           weights = gdf$weight)

greg2 <- lm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + 
              Ideology,
           data = gdf, 
           weights = gdf$weight)

greg3 <- lm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party + 
              Ideology,
           data = gdf, 
           weights = gdf$weight)
greg4 <- lm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party + 
              Ideology + newsint,
           data = gdf, 
           weights = gdf$weight)

greg5 <- lm(mean_align ~ Age + Race + Female + Income_Numeric +
            Educ + Party + Ideology + newsint + newsintxpid,
            data = gdf, 
            weights = gdf$weight)

varnames <- c("Age: 30-44",
              "Age: 45-59",
              "Age: 60+",
              "Race: White",
              "Race: Black",
              "Race: Hispanic",
              "Female",
              "Income level",
              "High school",
              "Some college",
              "College graduate",
              "Postgraduate",
              "Democrat",
              "Republican",
              "Conservative",
              "Liberal",
              "Moderate",
              "Very conservative",
              "Very liberal",
              "Political Interest",
              "Democrat x Interest",
              "Republican x Interest")

# Output model results in stargazer table. 
stargazer(greg,
          greg2,
          greg3,
          greg4,
          greg5,
          style = "apsr", 
          title = "\\textbf{Determinants of Media Diet Slant (Using URL Section Proxies)}",
          dep.var.labels = "Average media diet slant (news/politics only)", 
          column.labels = c("(PID)", 
                            "(Ideo)",
                            "(Combined)",
                            "(Interest)", 
                            "(Party x Interest)"), 
          se = starprep(greg, greg2, greg3, greg4),
          p = starprep(greg, greg2, greg3, greg4, stat = "p.value"),
          omit = c("newsintxpidParty: Independent.1",
                   "newsintxpidParty: Democrat.1",  
                   "newsintxpidParty: Republican.1"),
          covariate.labels = varnames,
          omit.stat = c("f", "ser", "rsq"), column.sep.width = "0pt", 
          star.cutoffs = c(.05, .01),
          align = TRUE,
          notes = c("OLS regressions with HC2 robust standard errors in parentheses; YouGov survey data with weights applied."),
          add.lines = list("",
                           c("Party ID", "\\checkmark", "", "\\checkmark", "\\checkmark", "\\checkmark"),
                           c("Ideology", "", "\\checkmark", "\\checkmark", "\\checkmark", "\\checkmark"),
                           c("Political Interest", "", "", "", "\\checkmark", "\\checkmark")),
          notes.append = TRUE,
          header = FALSE,
          float = TRUE,
          table.placement = "H",
          font.size = "small")

gglm <- glm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party,
            data = gdf, 
            weights = gdf$weight)

gglm2 <- glm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + 
              Ideology,
             data = gdf, 
             weights = gdf$weight)

gglm3 <- glm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party + 
              Ideology,
             data = gdf, 
             weights = gdf$weight)

gglm4 <- glm(mean_align ~ Age + Race + Female + Income_Numeric + Educ + Party + 
              Ideology + newsint,
             data = gdf, 
             weights = gdf$weight)

gglm5 <- glm(mean_align ~ Age + Race + Female + Income_Numeric +
             Educ + Party + Ideology + Party * newsint,
             data = gdf, 
             weights = gdf$weight)
greg5.2 <- lm(mean_align ~ Age + Race + Female + Income_Numeric +
             Educ + Party + Ideology + Party * newsint,
             data = gdf, 
             weights = gdf$weight)
```

```{r, echo = FALSE}
# MAUD/SMAUD/GIM Tests on New Data. 

# Compute MAUDs and SMAUDs for various model types. 
maud.tgi.1 <- maud.gen(starprep(greg)[[1]], 
                       summary(greg)$coefficients[,2])
maud.tgi.2 <- maud.gen(starprep(greg2)[[1]],
                       summary(greg2)$coefficients[,2])
maud.tgi.3 <- maud.gen(starprep(greg3)[[1]], 
                       summary(greg3)$coefficients[,2])
maud.tgi.4 <- maud.gen(starprep(greg4)[[1]],
                       summary(greg4)$coefficients[,2])
maud.tgi.5 <- maud.gen(starprep(greg5.2)[[1]],
                       summary(greg5.2)$coefficients[,2])
smaud.tgi.1 <- maud.gen(starprep(greg)[[1]], 
                        summary(greg)$coefficients[,2], 1)
smaud.tgi.2 <- maud.gen(starprep(greg2)[[1]],
                        summary(greg2)$coefficients[,2], 1)
smaud.tgi.3 <- maud.gen(starprep(greg3)[[1]], 
                        summary(greg3)$coefficients[,2], 1)
smaud.tgi.4 <- maud.gen(starprep(greg4)[[1]],
                        summary(greg4)$coefficients[,2], 1)
smaud.tgi.5 <- maud.gen(starprep(greg5.2)[[1]],
                        summary(greg5.2)$coefficients[,2], 1)
mauds.tgi <- c(maud.tgi.1, maud.tgi.2, maud.tgi.3,
               maud.tgi.4, maud.tgi.5)
smauds.tgi <- c(smaud.tgi.1, smaud.tgi.2, smaud.tgi.3,
                smaud.tgi.4, smaud.tgi.5)
avg.guess <- c(mean(mauds),
               mean(smauds))
avg.tgi <- c(mean(mauds.tgi),
             mean(smauds.tgi))
```

```{r, echo = FALSE, eval = FALSE}
# Compute GIM test and save (quite computationally intensive).
GIM.tgi.1 <- GIM(gglm, full = TRUE, B = 30, B2 = 25)
GIM.tgi.2 <- GIM(gglm2, full = TRUE, B = 30, B2 = 25)
GIM.tgi.3 <- GIM(gglm3, full = TRUE, B = 30, B2 = 25)
GIM.tgi.4 <- GIM(gglm4, full = TRUE, B = 30, B2 = 25)
GIM.tgi.5 <- GIM(gglm5, full = TRUE, B = 30, B2 = 25)
saveRDS(GIM.tgi.1, file = "input/TGI_Data/GIM_TGI_1.RData")
saveRDS(GIM.tgi.2, file = "input/TGI_Data/GIM_TGI_2.RData")
saveRDS(GIM.tgi.3, file = "input/TGI_Data/GIM_TGI_3.RData")
saveRDS(GIM.tgi.4, file = "input/TGI_Data/GIM_TGI_4.RData")
saveRDS(GIM.tgi.5, file = "input/TGI_Data/GIM_TGI_5.RData")
```

```{r, echo = FALSE}
# Figure 5. MAUD/SMAUD/GIM Tests on New Data. 

# Load pre-computed GIM Tests. 
GIM.tgi.1 <- readRDS("input/TGI_Data/GIM_TGI_1.RData")
GIM.tgi.2 <- readRDS("input/TGI_Data/GIM_TGI_2.RData")
GIM.tgi.3 <- readRDS("input/TGI_Data/GIM_TGI_3.RData")
GIM.tgi.4 <- readRDS("input/TGI_Data/GIM_TGI_4.RData")
GIM.tgi.5 <- readRDS("input/TGI_Data/GIM_TGI_5.RData")

# MAUD/SMAUD subfigures. 
unc.dat <- data.frame(maud = mauds.tgi, 
                      smaud = smauds.tgi, 
                      rot = c(GIM.tgi.1$`Rule of Thumb`,
                             GIM.tgi.2$`Rule of Thumb`,
                             GIM.tgi.3$`Rule of Thumb`,
                             GIM.tgi.4$`Rule of Thumb`,
                             GIM.tgi.5$`Rule of Thumb`),
                      tstat = c(GIM.tgi.1$`GIM test statistic`,
                                GIM.tgi.2$`GIM test statistic`,
                                GIM.tgi.3$`GIM test statistic`,
                                GIM.tgi.4$`GIM test statistic`,
                                GIM.tgi.5$`GIM test statistic`),
                      model = c("PID", "Ideo", "Combined", "Interest", "PID x Interest"))

m.plt <- unc.dat %>% 
  ggplot(aes(x = maud, y = model, label = model)) +
  geom_point(size = 3, shape = 24, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  labs(x = "MAUD",
       y = "Model", 
       title = "MAUDs by Model Specification") + 
  theme_pubr()

sm.plt <- unc.dat %>% 
  ggplot(aes(x = smaud, y = model, label = model)) +
  geom_point(size = 3, shape = 23, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  labs(x = "SMAUD",
       y = "Model", 
       title = "SMAUDs by Model Specification") + 
  theme_pubr()
  
# GIM statistics subfigures. 
rot.plt <- unc.dat %>% 
  ggplot(aes(x = rot, y = model, label = model)) +
  geom_point(size = 3, shape = 22, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  labs(x = "(GIM) Rule of Thumb",
       y = "Model", 
       title = "(GIM) Rules of Thumb by Model Specification") + 
  theme_pubr()

tstat.plt <- unc.dat %>% 
  ggplot(aes(x = rot, y = model, label = model)) +
  geom_point(size = 3, shape = 21, fill = "blue", color = "darkred") +
  geom_vline(xintercept = 0, lty = "dotted") + 
  labs(x = "(GIM) Rule of Thumb",
       y = "Model", 
       title = "(GIM) Test Statistics by Model Specification") + 
  theme_pubr()

# Arrange, save, and output plot. 
plt.2 <- ggarrange(m.plt,
                   sm.plt,
                   rot.plt,
                   tstat.plt,
                   nrow = 4) %>% 
  annotate_figure(top = text_grob("Figure 5. Model Dependence by Specification (URL Proxies)",
                  face = "bold",
                  size = 14))
ggsave("output/figures/figure_5.pdf", device = "pdf", width = 8, height = 10)
```
```{r, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Load Figure 1 into the RMarkdown document. 
knitr::include_graphics("output/figures/figure_5.pdf")
```

[TODO: Results do not replicate those of Guess (2021). Either classification strategies are measuring fundamentally different concepts or there is so much heterogeneity in individual news preferences that different samples generate significantly different results, and thus future studies of individual need extremely large $n$ (i.e., perhaps around 10,000 or more randomly sampled respondents).]

## Let's Get LOCO: What's Driving These Disparities in Results? 

```{r, echo=FALSE, waring = FALSE, message = FALSE}
# Figures 6/7: LOCO Analysis of Variable Importance in Guess (2021) and 
# Tyler, Grimmer, and Iyengar (2021).
# Sequester and reshape Guess (2021) data for LOCO prediction models. 
oldw <- getOption("warn") # Suppress annoying warnings from conformalInference R package. Be careful when running this code! It changes your global warnings settings to turn off all warnings. Make sure to run the bottom of this chunk or execute "options(warn = oldw)" manually in order to return your options to your original settings. 
options(warn = -1)

yg16.loco <- yg16 %>% 
  select(diet_mean_newspol,
         ideo5, 
         educ,
         gender,
         race,
         age,
         income,
         pid3,
         pol.class) %>% 
  drop_na() 

# Convert factors to integers.
yg16.loco$ideo5 <- yg16.loco$ideo5 %>% 
  factor(levels = yg16$ideo5 %>%  levels()) %>%
  as.integer()
yg16.loco$educ <- yg16.loco$educ %>% 
  factor(levels = yg16$educ %>% levels()) %>% 
  as.integer()
yg16.loco$gender <- yg16.loco$gender %>% 
  factor(levels = yg16$gender %>% levels()) %>% 
  as.integer()
yg16.loco$race <- yg16.loco$race %>% 
  factor(levels = yg16$race %>% levels()) %>% 
  as.integer()
yg16.loco$income <- yg16.loco$income %>% 
  factor(levels = unique(yg16$income %>% na.omit())) %>% 
  as.integer()
yg16.loco$pid3 <- yg16.loco$pid3 %>% 
  factor(levels = yg16$pid3 %>% levels()) %>% 
  as.integer()
yg16.loco$age <- yg16.loco$age %>% 
  factor(levels = unique(yg16$age)) %>% 
  as.integer()
yg16.loco$pol.class <- yg16.loco$pol.class %>% 
  factor(levels = yg16$pol.class %>% levels()) %>% 
  as.integer()

# Convert integer dataframe to numeric.
yg16.loco[] <- lapply(yg16.loco, as.numeric)


# Estimate LOCO predictions (one shot). 
set.seed(12345)
y <- yg16.loco$diet_mean_newspol
X <- yg16.loco[,-1]
my.lasso.funs <- lasso.funs(nlambda = 100, cv=TRUE, cv.rule = "1se")

fit.loco <- loco(X, y,
                 alpha=0.1, 
                 train.fun = my.lasso.funs$train,
                 predict.fun = my.lasso.funs$predict, 
                 active.fun = my.lasso.funs$active.fun)

# Capture LOCO results in plot.
mar = c(4.25,4.25,3.25,1)
w = 8
h = 10

pdf(file="output/figures/figure_6.pdf",w=w,h=h)
par(mfrow = c(2, 2))
ylim = range(fit.loco$inf.wilcox[[1]][,2:3])
J = length(fit.loco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="Figure 6. \n LOCO Analysis from Lasso + CV Model \n (Guess 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.loco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.loco$inf.wilcox[[1]][,2],
         1:J, fit.loco$inf.wilcox[[1]][,3],
         col="red", lwd=2)

# With forward stepwise. 
my.step.funs <- lars.funs(type = "step", max.steps = 50, cv = TRUE, cv.rule = "1se")
fit.steploco <- loco(X, y, 
                     alpha = 0.1, 
                     train.fun = my.step.funs$train,
                     predict.fun = my.step.funs$predict,
                     active.fun = my.step.funs$active.fun)

# Capture stepwise LOCO results in plot. 
ylim = range(fit.steploco$inf.wilcox[[1]][,2:3])
J = length(fit.steploco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="LOCO Analysis from Stepwise + CV Model \n (Guess 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.steploco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.steploco$inf.wilcox[[1]][,2],
         1:J, fit.steploco$inf.wilcox[[1]][,3],
         col="red", lwd=2)

# Sequester and reshape Tyler, Grimmer, & Iyengar (2021) data for LOCO prediction 
# models. 
gdf.loco <- gdf %>% 
  select(mean_align,
         Ideology_Numeric,
         Age,
         Race,
         Female,
         Income,
         pid,
         newsint) %>% 
  drop_na()

# Convert factors to integers. 
gdf.loco$Age <- gdf.loco$Age %>% as.integer()
gdf.loco$Race <- gdf.loco$Race %>% as.integer()
gdf.loco$Income <- gdf.loco$Income %>% as.integer()

# Convert integer dataframe to numeric.
gdf.loco[] <- lapply(gdf.loco, as.numeric)

# Estimate LOCO predictions (one shot). 
set.seed(12345)
y <- gdf.loco$mean_align
X <- gdf.loco[,-1]
my.lasso.funs <- lasso.funs(nlambda = 100, cv=TRUE, cv.rule = "1se")
fit.loco <- loco(X, y,
                 alpha=0.1, 
                 train.fun = my.lasso.funs$train,
                 predict.fun = my.lasso.funs$predict, 
                 active.fun = my.lasso.funs$active.fun)

# Capture LOCO results in plot.
ylim = range(fit.loco$inf.wilcox[[1]][,2:3])
J = length(fit.loco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="Figure 7. \n LOCO Analysis from Lasso + CV Model \n (TGI 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.loco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.loco$inf.wilcox[[1]][,2],
         1:J, fit.loco$inf.wilcox[[1]][,3],
         col="red", lwd=2)

# With forward stepwise. 
my.step.funs <- lars.funs(type = "step", max.steps = 50, cv = TRUE, cv.rule = "1se")
fit.steploco <- loco(X, y, 
                     alpha = 0.1, 
                     train.fun = my.step.funs$train,
                     predict.fun = my.step.funs$predict,
                     active.fun = my.step.funs$active.fun)

# Capture stepwise LOCO results in plot. 
ylim = range(fit.steploco$inf.wilcox[[1]][,2:3])
J = length(fit.steploco$active[[1]])
plot(c(), c(), xlim=c(1,J), ylim=ylim, xaxt="n",
     main="LOCO Analysis from Stepwise + CV Model \n (TGI 2021 Models/Data)",
     xlab="Variable", ylab="Confidence interval")
axis(side=1, at=1:J, labels=FALSE)
for (j in 1:J) {
  axis(side=1, at=j, labels=fit.steploco$active[[1]][j], cex.axis=0.75,
       line=0.5*j%%2)
}
abline(h=0)
segments(1:J, fit.steploco$inf.wilcox[[1]][,2],
         1:J, fit.steploco$inf.wilcox[[1]][,3],
         col="red", lwd=2)
graphics.off()
options(warn = oldw)
```
```{r, echo = FALSE, echo = F, fig.show="hold", out.width="100%", fig.align="center"}
# Output figures 6/7. 
knitr::include_graphics("output/figures/figure_6.pdf")
```

[LOCO: Leave-One-Covariate-Out Inference]

# Discussion and Conclusion
















